
Record: Order-Adaptive Entropy Gating + BackoffNgramMixer (val_bpb=0.5466)#798

Open
travispchen wants to merge 1 commit into openai:main from travispchen:oaeg-backoff-ngram

Conversation

@travispchen

Order-Adaptive Entropy Gating + BackoffNgramMixer + Drift-Free TTT

val_bpb: 0.5466 (3-seed mean, std 0.0010) | ~15.99 MB | 8×H100 SXM

Adds order-adaptive entropy gating on top of PR #779's BackoffNgramMixer + Drift-Free TTT submission. Instead of using a single entropy center for all n-gram orders, each order gets its own threshold — higher orders are trusted at lower entropy, lower orders only kick in when the model is more uncertain.

Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128)

| Seed | step_avg | steps | Pre-TTT bpb | Post-TTT bpb | TTT gain | TTT time | Artifact (bytes) |
|------|----------|-------|-------------|--------------|----------|----------|------------------|
| 1337 | 99.3ms | 5,863 | 1.1279 | 0.5478 | -0.5801 | 607s | 15,995,959 |
| 42 | 98.3ms | 5,863 | 1.1362 | 0.5458 | -0.5904 | 606s | 15,979,251 |
| 2025 | 99.2ms | 5,869 | 1.1369 | 0.5463 | -0.5906 | 607s | 15,994,227 |
| Mean | 98.9ms | 5,865 | 1.1337 | 0.5466 (std 0.0010) | -0.5871 | ~607s | |

What Changed vs PR #779

PR #779 uses a single entropy_center=3.5 for all n-gram orders. We replace this with per-order entropy centers:

```python
# PR #779 (single entropy center for all orders)
alpha = alpha_min + (alpha_max - alpha_min) * sigmoid(2.0 * (entropy - 3.5))

# This submission (per-order entropy centers)
ent_centers = {7: 3.0, 6: 3.2, 5: 3.5, 4: 3.8, 3: 4.2, 2: 4.5}
ent_center = ent_centers[matched_order]
alpha = alpha_min + (alpha_max - alpha_min) * sigmoid(2.0 * (entropy - ent_center))
```

Higher-order n-grams (7, 6, 5) are trusted at lower model entropy — when the model is fairly confident, the precise n-gram correction refines the prediction. Lower-order n-grams (4, 3, 2) only intervene at higher entropy — when the model is confused enough that even coarse statistics help.
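The gating described above can be sketched as a standalone function. Note that `order_adaptive_alpha` and the `alpha_min`/`alpha_max` defaults here are illustrative assumptions; the submission's actual bounds come from PR #779's BackoffNgramMixer.

```python
import math

# Per-order entropy centers from the PR: higher orders use lower centers,
# so their correction activates at lower model entropy.
ENT_CENTERS = {7: 3.0, 6: 3.2, 5: 3.5, 4: 3.8, 3: 4.2, 2: 4.5}

def order_adaptive_alpha(entropy, matched_order, alpha_min=0.0, alpha_max=0.5):
    """Mixing weight for the n-gram distribution at a given model entropy.

    alpha_min/alpha_max are hypothetical defaults for illustration only.
    """
    center = ENT_CENTERS[matched_order]
    gate = 1.0 / (1.0 + math.exp(-2.0 * (entropy - center)))  # sigmoid
    return alpha_min + (alpha_max - alpha_min) * gate
```

At an entropy of 3.5, an order-7 match (center 3.0) receives a larger weight than an order-2 match (center 4.5), which is exactly the intended asymmetry: precise statistics engage early, coarse statistics only under high uncertainty.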

This is an eval-time-only change. It modifies how existing n-gram statistics are combined with neural predictions, not when data enters the cache. The n-gram cache is still updated strictly AFTER scoring each batch (score-first).
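The score-first ordering can be sketched as a minimal eval loop. This is a toy bigram-only cache for illustration; the actual mixer maintains counts for orders 2 through 7, and `evaluate_score_first`/`score_fn` are hypothetical names.

```python
from collections import defaultdict

def evaluate_score_first(batches, score_fn):
    """Score each batch with the cache as it stood BEFORE that batch,
    then fold the batch's n-grams in. Predictions therefore never see
    statistics derived from the tokens being scored."""
    cache = defaultdict(int)  # bigram -> count (toy stand-in for orders 2-7)
    losses = []
    for tokens in batches:
        losses.append(score_fn(tokens, cache))  # sees only past batches
        for a, b in zip(tokens, tokens[1:]):    # update strictly AFTER scoring
            cache[(a, b)] += 1
    return losses, cache
```

The key property is that the first batch is always scored against an empty cache, and batch *k* is scored against exactly the statistics of batches 1..*k*-1.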

Legality

  • Score-first: N-gram cache updated AFTER scoring each batch. No future tokens leak into predictions.
  • No oracle selection: Alpha depends only on model entropy and n-gram order, not on ground truth.
  • Artifact size: All seeds strictly under 16,000,000 bytes (max: 15,995,959).
  • Training time: Capped at 600s (10 min) on 8×H100 (actual: ~582s).
  • Eval time: TTT eval ≤607s on 8×H100.

Ablation

| Change | Post-TTT bpb | Delta |
|--------|--------------|-------|
| PR #779 baseline (single entropy center) | 0.6713 | |
| + Order-adaptive entropy gating | 0.5478 | -0.1235 |

Credits

  • BackoffNgramMixer + Drift-Free TTT + Base model: PR #779
  • Order-adaptive entropy gating: This submission

…5466, 3-seed mean)

Adds order-adaptive entropy gating on top of PR openai#779's BackoffNgramMixer + Drift-Free TTT.
Per-order entropy centers replace single threshold: higher n-gram orders trusted at lower entropy.
3-seed validation: 0.5478, 0.5458, 0.5463 (mean 0.5466, std 0.0010).
All artifacts strictly under 16,000,000 bytes.

Co-Authored-By: Travis Chen <travispchen@gmail.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
Add per-order entropy centers from PR openai#798 insight:
  order 7: center=3.0, order 6: 3.2, order 5: 3.5,
  order 4: 3.8, order 3: 4.2, order 2: 4.5
Higher orders trusted at lower entropy, lower orders only at high
uncertainty. Cubric multipliers applied on top.

Original X-WING (0.5644) untouched in concepts/xwing/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
PR openai#798's approach on our engine: per-order entropy centers
(7:3.0, 6:3.2, 5:3.5, 4:3.8, 3:4.2, 2:4.5) without cubric.
Testing if cubric was hurting when combined with per-order gating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
